Debugging an M-CORD provisioning failure

The error shows up in this task:

  • Task [maas-provision : Wait for node to be fully provisioned]

SHELL CMD: /usr/local/bin/get-node-prov-stat => status "1"

When provisioning completes, the output is 2.

Dig into the details

Look in the log /etc/maas/ansible/logs/node-.log

It points to the failing step: juju-compute-setup

Dig further with

`juju status --format=tabular`
---

![](https://i.imgur.com/UHJOVXS.png)

The output shows that nova-compute failed: its "install" hook failed.

Log in to the compute node for details, and test package installation with

`sudo apt-get install traceroute`

![](https://i.imgur.com/Q5XEBdb.png)

It looks like the Python package machinery is failing.

APT depends on dpkg, and dpkg hit an error at the nova-compute (libvirt) user setup.

Look at /var/log/nova/nova-compute.log

The failure occurs during the '_build_resource' operation at nova/compute/manager.py#243, and the stack trace shows the packages involved:
- nova
- glance-client
- requests
- urllib3

## Bug fix

Package locations involved:

- /usr/lib/python2.7/dist-packages/nova/
- /usr/lib/python2.7/dist-packages/oslo_concurrency/
- /usr/lib/python2.7/dist-packages/glanceclient/
- /usr/lib/python2.7/dist-packages/requests/
- /usr/local/lib/python2.7/dist-packages/urllib3/

Python searches for a package in this order:
- the directory of the running program
- the directories listed in PYTHONPATH
- the system paths (standard library and dist-packages directories)
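
The effective search order is what ends up in `sys.path`. As a minimal sketch (the helper name `find_copies` is hypothetical, not part of any library), the function below walks a list of search directories in order and reports every directory that holds a copy of a given package, which makes duplicate installations easy to spot. It only looks for plain source files, so zip imports and compiled-only modules are ignored.

```python
import os
import sys


def find_copies(name, search_path=None):
    """Return every directory on the search path that contains `name`.

    The result keeps search-path order, so the first entry is the copy
    that a plain `import name` would normally pick up.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry in sys.path stands for the current directory.
        base = os.path.join(entry or os.curdir, name)
        is_package = os.path.isfile(os.path.join(base, "__init__.py"))
        is_module = os.path.isfile(base + ".py")
        if (is_package or is_module) and entry not in hits:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # More than one line of output means more than one installed copy.
    for directory in find_copies("urllib3"):
        print(directory)
```

If this prints more than one directory for a package, the first one listed is the copy that gets imported.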

For urllib3, a single compute node has two conflicting package versions:

```python
ubuntu@nasty-ground:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib3
>>> urllib3
<module 'urllib3' from '/usr/local/lib/python2.7/dist-packages/urllib3/__init__.pyc'>
>>> urllib3.__version__
'1.24'
```

```python
ubuntu@nasty-ground:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
>>> sys.path.remove('/usr/local/lib/python2.7/dist-packages')
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/lib/python2.7/dist-packages']
>>> import urllib3
>>> urllib3
<module 'urllib3' from '/usr/lib/python2.7/dist-packages/urllib3/__init__.pyc'>
>>> urllib3.__version__
'1.7.1'
>>>
```

sys.path is searched in order, and /usr/local/lib/python2.7/dist-packages comes before /usr/lib/python2.7/dist-packages, so the urllib3 under /usr/local is the one that gets imported.
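
The shadowing effect can be reproduced without touching the real packages. The sketch below is an illustration only: it builds two throwaway packages in temporary directories (the package name `fakelib` and both helper functions are hypothetical) and imports the name with different directory orderings. It is written for Python 3, whereas the compute node runs 2.7, but the first-match rule is the same.

```python
import importlib
import os
import sys
import tempfile


def make_fake_package(root, name, version):
    """Create root/name/__init__.py that only defines __version__."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write("__version__ = %r\n" % version)


def import_first_match(name, search_dirs):
    """Import `name` with search_dirs placed at the front of sys.path."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)      # forget any earlier import of this name
    sys.path[:0] = search_dirs       # earlier entries are searched first
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path[:] = saved_path     # restore the original search path
        sys.modules.pop(name, None)


if __name__ == "__main__":
    local_dir = tempfile.mkdtemp()   # stands in for /usr/local/lib/python2.7/dist-packages
    system_dir = tempfile.mkdtemp()  # stands in for /usr/lib/python2.7/dist-packages
    make_fake_package(local_dir, "fakelib", "1.24")
    make_fake_package(system_dir, "fakelib", "1.7.1")
    print(import_first_match("fakelib", [local_dir, system_dir]).__version__)  # 1.24
    print(import_first_match("fakelib", [system_dir]).__version__)             # 1.7.1
```

With both directories present, the copy in the earlier directory wins. Once that directory is taken out of the search, the other copy is loaded, which matches the `sys.path.remove` experiment in the session above.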

urllib3 1.7.1 does not have the `key_fn_by_scheme` dictionary, and this difference between the two implementations causes the issue.

The solution is to remove the 1.24 version:

`sudo pip uninstall urllib3`

Check the version again:

```python
>>> import urllib3
>>> urllib3
<module 'urllib3' from '/usr/lib/python2.7/dist-packages/urllib3/__init__.pyc'>
>>> urllib3.__version__
'1.7.1'
>>>
```
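
The same check can be scripted instead of typed into the interpreter. This is a minimal sketch, and both helper names (`module_location`, `is_shadowed_by`) are hypothetical. It imports a module, reports where it was loaded from, and tests whether that location sits under a given prefix.

```python
import importlib
import os


def module_location(name):
    """Import `name` and return (directory it was loaded from, version or None)."""
    module = importlib.import_module(name)
    location = os.path.dirname(os.path.abspath(module.__file__))
    return location, getattr(module, "__version__", None)


def is_shadowed_by(name, prefix):
    """Return True if `name` is loaded from `prefix` or a directory below it."""
    location, _ = module_location(name)
    prefix = os.path.abspath(prefix)
    return location == prefix or location.startswith(prefix + os.sep)
```

On the compute node, `is_shadowed_by("urllib3", "/usr/local/lib/python2.7/dist-packages")` would be expected to return `False` once the 1.24 copy has been uninstalled.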

Restart the nova-compute service (this step can be skipped):

`sudo service nova-compute restart`

Back on the head node, run

`juju status --format=tabular`

![](https://i.imgur.com/hemhFzj.png)

Manually resolve the failed hook:

`juju resolved --retry nova-compute/0`

---

Wait around 10 minutes for this result:

![](https://i.imgur.com/TVta7KV.png)

Check with the same command, `juju status --format=tabular`.

Since our compute node is already provisioned, modify the playbook here:

  • ~/cord/build/ansible/roles/maas-provision/tasks/main.yml

Status 6 means provisioned:

```yaml
- name: Wait for node to become ready
  shell: maas cord nodes list|jq -r '.[] | select(.status == 0 or .status == 6).system_id'
  register: nodeid
  until: nodeid.stdout
  retries: 40
  delay: 15
  tags:
    - skip_ansible_lint
```
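
To make the selector and the retry loop easier to reason about, here is a rough Python sketch of what the task does. It is an illustration under assumptions, not the playbook's implementation: `fetch_nodes_json` is a hypothetical callable standing in for `maas cord nodes list`, and the loop only approximates Ansible's `until`/`retries`/`delay` behaviour.

```python
import json
import time

# Status codes the task accepts; per the note above, 6 means provisioned.
ACCEPTED_STATUSES = (0, 6)


def select_system_ids(nodes_json, accepted=ACCEPTED_STATUSES):
    """Mirror `.[] | select(.status == 0 or .status == 6).system_id`."""
    nodes = json.loads(nodes_json)
    return [node.get("system_id") for node in nodes
            if node.get("status") in accepted]


def wait_for_node(fetch_nodes_json, retries=40, delay=15, sleep=time.sleep):
    """Poll until at least one node matches, or give up after `retries` tries."""
    for attempt in range(retries):
        ids = select_system_ids(fetch_nodes_json())
        if ids:
            return ids
        if attempt < retries - 1:
            sleep(delay)             # wait before the next attempt
    return []
```

With `retries: 40` and `delay: 15`, the task waits up to about 600 seconds (10 minutes) before giving up.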

Rebuild with the following commands:

```shell
# tear down and clean profile
make xos-teardown; make clean-openstack; make clean-onos;
rm milestones/cord-config; rm milestones/copy-co*;
make clean-profile;
cd genconfig/ && rm -rf config.* cord_* inventory.ini;

# you can update config or tosca now

# rebuild
cd ../ && make config PODCONFIG=mcord-oai-virtual.yml;
make -j4 build; make compute-node-refresh; make mcord-oai-test;
```

Result

  • original task observation

  • nova output

  • XOS output

Root cause

This comes down to how Python packages are installed, combined with an upstream issue in urllib3/requests.

https://github.com/urllib3/urllib3/issues/1456

Many developers are facing the same problem with the new urllib3 release.